LCORE-2310: Added non-streaming agent query utilities#1880
Conversation
WalkthroughAdds non-streaming agent helpers to map inference errors to LCS error responses, extract token usage, build TurnSummary from agent runs (including tool calls), and orchestrate non-streaming agent execution with moderation handling. Unit tests exercise success, moderation-block, and error mappings. ChangesNon-streaming agent query helpers
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/utils/agents/query.py`:
- Around line 298-300: Update the Raises section of the docstring in
src/utils/agents/query.py to use clear, grammatical wording: replace the phrase
"On moderation is not applicable" with something like "HTTPException: If
moderation is not applicable for the input" and keep the second clause as
"HTTPException: on agent or provider failure" (or combine into a single line
that reads "HTTPException: If moderation is not applicable for the input, or on
agent/provider failure") so the meaning is explicit and grammatical; edit the
docstring where the Raises block mentions HTTPException to reflect this clearer
phrasing.
- Around line 83-98: Update the docstring to match the implementation: clarify
that non-context-length RuntimeError exceptions are converted to a generic
internal error response (InternalServerErrorResponse.generic()) rather than
re-raised. In the exception-handling function that checks AgentRunError,
APIStatusError, APIConnectionError and RuntimeError via
is_context_length_error(), change the docstring text to state that
non-context-length RuntimeError values return
InternalServerErrorResponse.generic() (or list the actual behavior), referencing
the RuntimeError branch and InternalServerErrorResponse.generic() so future
readers see the implemented behavior.
- Around line 149-152: In get_agent_finish_reason, avoid directly constructing
AgentFinishReason(response.finish_reason) because response.finish_reason can be
"tool_call" (or other unknown values) which will raise ValueError; update the
function to either explicitly check for "tool_call" on response.finish_reason
and return AgentFinishReason.ERROR, or wrap the enum conversion in a try/except
ValueError that returns AgentFinishReason.ERROR as a safe fallback, referencing
response.finish_reason and AgentFinishReason to locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 193e0be0-68c0-4c67-b19f-e11b7be241e0
📒 Files selected for processing (2)
src/utils/agents/query.pytests/unit/utils/agents/test_query.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: integration_tests (3.12)
- GitHub Check: mypy
- GitHub Check: Pyright
- GitHub Check: spectral
- GitHub Check: unit_tests (3.13)
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.12)
- GitHub Check: Pylinter
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/utils/agents/test_query.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/utils/agents/query.py
🔇 Additional comments (12)
src/utils/agents/query.py (7)
1-57: LGTM!
60-68: LGTM!
101-135: LGTM!
154-182: LGTM!
184-219: LGTM!
222-276: LGTM!
315-315: Avoid unsafe cast: ensureresponses_params.inputisstrbeforecast(str, responses_params.input)
ResponseInputisstr | list[ResponseItem](src/models/common/responses/types.py), socast(str, ...)is only safe if this non-streaming path guarantees string input earlier; add anisinstance(..., str)guard (or normalize list input) before callingawait agent.run(...).tests/unit/utils/agents/test_query.py (5)
1-131: LGTM!
133-181: LGTM!
183-243: LGTM!
358-487: LGTM!
276-290: Ensure the test actually exercisesModelResponse.native_tool_calls:build_turn_summary_from_agent_runiteratesmessage.native_tool_calls(src/utils/agents/query.pylines 262-264), but the test buildsModelResponseby placingNativeToolCallPart/NativeToolReturnPartintoparts(tests/unit/utils/agents/test_query.pylines 276-290). This only tests the native-tools path ifModelResponse.native_tool_callsis populated fromparts; otherwise, the test should construct/setnative_tool_callsexplicitly (or assert behavior that depends on native-tool processing).
| if response.finish_reason is None: | ||
| return AgentFinishReason.ERROR | ||
| return AgentFinishReason(response.finish_reason) | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check pydantic-ai FinishReason definition to confirm possible values
rg -n "FinishReason" --type py -A 5Repository: lightspeed-core/lightspeed-stack
Length of output: 7310
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the relevant enum + get_agent_finish_reason implementation
sed -n '40,180p' src/utils/agents/query.py
# Show the unit tests for get_agent_finish_reason
sed -n '120,220p' tests/unit/utils/agents/test_query.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 8661
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate imports/definitions around ModelResponse / FinishReason usage
rg -n "from .* import .*ModelResponse|ModelResponse" src/utils/agents/query.py src/utils -S
# Find how finish_reason is typed/used elsewhere
rg -n "\"tool_calls\"|\"tool_output\"|finish_reason\s*:" src -S
# Show utils.pydantic_ai module (likely where ModelResponse/FinishReason are re-exported)
fd -a -t f "pydantic_ai.py" src/utils -x echo
fd -a -t f "pydantic_ai*" src/utils -x echoRepository: lightspeed-core/lightspeed-stack
Length of output: 1254
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect utils/pydantic_ai.py for re-exports/types that define ModelResponse / FinishReason
sed -n '1,220p' src/utils/pydantic_ai.py
# Find any references to "finish_reason" across the repo
rg -n "finish_reason" src -S
# Inspect pydantic-ai ModelResponse type hints only if vendored/aliased in this repo;
# otherwise, we can at least see whether code ever expects tool_calls/tool_output.
rg -n "\"tool_calls\"|\"tool_output\"" src -SRepository: lightspeed-core/lightspeed-stack
Length of output: 5568
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
import inspect
import typing
try:
import pydantic_ai
from pydantic_ai.messages import ModelResponse
except Exception as e:
print("Failed to import pydantic_ai / ModelResponse:", e)
raise
print("pydantic_ai version:", getattr(pydantic_ai, "__version__", "unknown"))
print("ModelResponse:", ModelResponse)
# Print annotated fields for ModelResponse
ann = getattr(ModelResponse, "__annotations__", None)
print("ModelResponse.__annotations__ keys:", sorted(list(ann.keys())) if ann else None)
# Try to find finish_reason field annotation
finish_ann = None
if ann and "finish_reason" in ann:
finish_ann = ann["finish_reason"]
print("finish_reason annotation:", finish_ann)
# Try to resolve/inspect any enums/literals involved in finish_reason
try:
from pydantic_ai.messages import FinishReason
print("FinishReason:", FinishReason)
print("FinishReason members/values:")
if hasattr(FinishReason, "__members__"):
print(list(FinishReason.__members__.items())[:50])
else:
print("Non-enum type:", FinishReason)
except Exception as e:
print("Could not import FinishReason from pydantic_ai.messages:", e)
# If finish_reason is a Literal, show its args
if finish_ann is not None:
origin = typing.get_origin(finish_ann)
args = typing.get_args(finish_ann)
print("finish_reason origin:", origin)
print("finish_reason args (first 30):", list(args)[:30])
PYRepository: lightspeed-core/lightspeed-stack
Length of output: 294
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find dependency pins for pydantic-ai / pydantic_ai
ls -la
fd -a -t f "pyproject.toml|requirements*.txt|uv.lock|poetry.lock|Pipfile.lock" . 2>/dev/null || true
rg -n "pydantic[-_]ai|pydantic_ai" -S pyproject.toml requirements*.txt poetry.lock uv.lock Pipfile.lock 2>/dev/null || true
# Also show the relevant dependency section if pyproject exists
if [ -f pyproject.toml ]; then
sed -n '1,200p' pyproject.toml
fiRepository: lightspeed-core/lightspeed-stack
Length of output: 9409
🌐 Web query:
pydantic-ai 1.105.0 FinishReason tool_calls tool_output
💡 Result:
In Pydantic AI, FinishReason, tool_calls, and tool_output are core concepts related to how an agent manages model responses and tool execution. FinishReason The FinishReason is an enumeration that describes why a model stopped generating a response, normalized to OpenTelemetry values [1]. It includes the following possible values: 'stop', 'length', 'content_filter', 'tool_call', and 'error' [1]. Developers can use the finish_reason attribute on a ModelResponse to inspect why a model ended its generation, for instance, to implement fallback logic if a response was truncated due to length limits or a content filter [2]. Tool Calls and Tool Output Pydantic AI distinguishes between function tools (which perform actions) and output tools (which produce the final result of a run) [3][4]. 1. Tool Calls: When a model decides to use a tool, it emits a tool call. If the model emits a tool call, the finish_reason for that response is typically 'tool_call' [1][5]. 2. Tool Output: This is the data returned after executing a tool. In the default "Tool Output" mode, the JSON schema for output types is provided to the model as a special output tool [6][7]. 3. Managing Execution: Recent updates, such as those in PR #5339 (which influenced behavior leading up to v1.105.0), refined how these are handled [8][9]. Specifically: - The end_strategy parameter (e.g., 'graceful', 'exhaustive') controls the execution flow when a model emits both function tools and an output tool in the same response [4][7]. - 'Graceful' ensures function tools are executed even if a final result is found, while 'exhaustive' ensures all tools—including output tools—are executed [4]. - ToolOutput can be used as a marker class to customize output tools [6][7]. Pydantic AI v1.105.0 was released on June 2, 2026, and focused on features like on-demand loading and Grok model support [8]. The concepts of FinishReason and tool management remain central to the framework's operational flow [1][6].
Citations:
- 1: https://pydantic.dev/docs/ai/api/pydantic-ai/messages/
- 2: https://pydantic.dev/docs/ai/models/overview/
- 3: https://github.com/pydantic/pydantic-ai/blob/8b9ac2bd/docs/tools.md
- 4: https://pydantic.dev/docs/ai/tools-toolsets/tools-advanced/
- 5: https://github.com/pydantic/pydantic-ai/blob/v1.70.0/pydantic_ai_slim/pydantic_ai/models/openai.py
- 6: https://pydantic.dev/docs/ai/core-concepts/output/
- 7: https://github.com/pydantic/pydantic-ai/blob/8b9ac2bd/docs/output.md
- 8: https://github.com/pydantic/pydantic-ai/releases/tag/v1.105.0
- 9: Run tools in emission order with
gracefuldefault;sequential=Trueis now a per-tool barrier; function-toolModelRetrysuppresses the output result pydantic/pydantic-ai#5339
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "get_agent_finish_reason\\(|finish_reason=.*tool" tests src -SRepository: lightspeed-core/lightspeed-stack
Length of output: 882
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '200,280p' src/utils/agents/query.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 2918
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "\"tool_call\"|\"tool_calls\"|\"tool_output\"|tool_output|tool_calls" src tests -SRepository: lightspeed-core/lightspeed-stack
Length of output: 28367
🌐 Web query:
pydantic-ai FinishReason enum values stop length content_filter tool_call error ModelResponse.finish_reason
💡 Result:
In Pydantic AI, the FinishReason type is a TypeAlias representing the reason a model stopped generating a response [1]. It is defined as a Literal string type with the following valid values [1]: stop: The model reached a natural stop point or a provided stop sequence [1][2]. length: The maximum number of tokens specified in the request was reached [1][2]. content_filter: The generation was stopped due to a content filter flag [1][2]. tool_call: The model initiated a tool call [1]. error: The generation ended due to an error (e.g., failure or cancellation) [1][3][4]. The ModelResponse class includes a finish_reason attribute (of type FinishReason | None) that records this information [1]. This field is intended to normalize diverse provider-specific finish reasons into a standardized format aligned with OpenTelemetry conventions [1][5]. Individual model providers (such as OpenAI, Anthropic, and xAI) implement internal mapping logic to convert their native API response finish reasons into these standardized Pydantic AI FinishReason values [3][6][7][8].
Citations:
- 1: https://pydantic.dev/docs/ai/api/pydantic-ai/messages/
- 2: Add
finish_reasontoModelResponsepydantic/pydantic-ai#1539 - 3: https://github.com/pydantic/pydantic-ai/blob/v1.70.0/pydantic_ai_slim/pydantic_ai/models/openai.py
- 4: https://github.com/pydantic/pydantic-ai/blob/main/pydantic_ai_slim/pydantic_ai/models/openai.py
- 5: Add
idandfinish_reasontoModelResponsepydantic/pydantic-ai#886 - 6: https://github.com/pydantic/pydantic-ai/blob/531d6da5/pydantic_ai_slim/pydantic_ai/models/xai.py
- 7: https://github.com/pydantic/pydantic-ai/blob/7f5214c6/pydantic_ai_slim/pydantic_ai/models/anthropic.py
- 8: https://github.com/pydantic/pydantic-ai/blob/7f5214c6/pydantic_ai_slim/pydantic_ai/models/openai.py
Potential ValueError when finish_reason is tool_call (or otherwise unknown)
get_agent_finish_reason() constructs AgentFinishReason(response.finish_reason) without a guard. pydantic-ai’s FinishReason includes "tool_call", which is not a member of the local AgentFinishReason enum, so this will raise ValueError instead of mapping to an LCS error.
Add explicit handling for tool_call (or catch ValueError and map unknown finish reasons to a safe fallback like AgentFinishReason.ERROR) rather than directly converting to the enum.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/utils/agents/query.py` around lines 149 - 152, In
get_agent_finish_reason, avoid directly constructing
AgentFinishReason(response.finish_reason) because response.finish_reason can be
"tool_call" (or other unknown values) which will raise ValueError; update the
function to either explicitly check for "tool_call" on response.finish_reason
and return AgentFinishReason.ERROR, or wrap the enum conversion in a try/except
ValueError that returns AgentFinishReason.ERROR as a safe fallback, referencing
response.finish_reason and AgentFinishReason to locate the change.
There was a problem hiding this comment.
From pydantic-ai openai processor:
_CHAT_FINISH_REASON_MAP: dict[
Literal['stop', 'length', 'tool_calls', 'content_filter', 'function_call'], FinishReason
] = {
'stop': 'stop',
'length': 'length',
'tool_calls': 'tool_call',
'content_filter': 'content_filter',
'function_call': 'tool_call',
}
_RESPONSES_FINISH_REASON_MAP: dict[Literal['max_output_tokens', 'content_filter'] | ResponseStatus, FinishReason] = {
'max_output_tokens': 'length',
'content_filter': 'content_filter',
'completed': 'stop',
'cancelled': 'error',
'failed': 'error',
}OpenResponses cannot result in tool_call finish reason.
There was a problem hiding this comment.
@asimurka, thanks for clarifying — you're right. Since this code targets the OpenAI Responses API (not Chat Completions), _RESPONSES_FINISH_REASON_MAP applies, and "tool_call" is not a possible finish reason from that path. All values it can emit — stop, length, content_filter, error — are covered by AgentFinishReason. My concern doesn't apply here.
✏️ Learnings added
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1880
File: src/utils/agents/query.py:149-152
Timestamp: 2026-06-09T07:36:38.363Z
Learning: In `src/utils/agents/query.py` (lightspeed-core/lightspeed-stack), `get_agent_finish_reason` and `AgentFinishReason` enum are designed specifically for the pydantic-ai OpenAI **Responses API** (not Chat Completions). The Responses API `_RESPONSES_FINISH_REASON_MAP` only produces `stop`, `length`, `content_filter`, and `error` as finish reasons — `tool_call` cannot occur from this API path. Therefore, a `ValueError` from `AgentFinishReason(response.finish_reason)` is not a concern in this context.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
0386bad to
dd950ae
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/utils/agents/query.py`:
- Around line 74-86: Update the function docstrings in src/utils/agents/query.py
to follow the repo's Google-style docstring headings: replace "Args:" with
"Parameters:", ensure each docstring includes "Returns:" and "Raises:" sections
(and "Attributes:" for any classes), and keep the same descriptions and
parameter names/typing; specifically update the docstring for the top helper
that "Map agent run failures..." and the other helpers at the reported ranges
(lines ~104-112, 136-143, 156-164, 187-196, 228-242, 283-298) so they
consistently use "Parameters:", "Returns:", and "Raises:" headings and conform
to the project's docstring conventions.
- Around line 254-257: The loop over run_result.new_messages() currently
overwrites state.turn_summary.llm_response for each ModelResponse with text;
instead, append each message.text to preserve all fragments: when encountering a
ModelResponse in the loop, check state.turn_summary.llm_response (which may be
None/empty) and concatenate the new message.text (with a separator like a space
or newline) so earlier assistant text is not lost; update the code paths that
set llm_response to perform accumulation rather than assignment.
- Around line 311-313: The code casts responses_params.input to str before
calling agent.run but prepare_input() always returns a str and
retrieve_agent_response() has no production callers, so tighten the contract:
update retrieve_agent_response()'s signature to expect input: str (instead of
str | list[ResponseItem]) or add a lightweight runtime assertion inside
retrieve_agent_response() (e.g., assert isinstance(responses_params.input, str)
or raise TypeError) and then remove the cast at the call site where
build_agent(...) and agent.run(...) are invoked; reference
prepare_responses_params, prepare_input, retrieve_agent_response, build_agent,
and agent.run when making this change so the type narrowing/guard is applied
consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 14fcd2e8-5a1a-4dfc-9846-6c5a167f47ce
📒 Files selected for processing (2)
src/utils/agents/query.pytests/unit/utils/agents/test_query.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/utils/agents/test_query.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/utils/agents/query.py
| RuntimeError: Re-raised when ``exc`` is a non-agent ``RuntimeError`` that is | ||
| not a recognized context-length failure. | ||
| """ | ||
| if isinstance(exc, AgentRunError): |
There was a problem hiding this comment.
can be rewritten using pattern matching, but it's a nitpick
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/agents/query.py (1)
211-218:⚠️ Potential issue | 🟠 MajorFix
llm_calls_totalundercount for multi-step agent runs
extract_agent_token_usage()setsTokenCounter.llm_calls = max(usage.requests, 1), but it callsrecording.record_llm_call()only once;record_llm_call()incrementsmetrics.llm_calls_total...inc()by +1 with no parameter for the actual count. IfRunUsage.requests > 1,llm_calls_totalwill underreport vs the returnedtoken_usage.llm_calls. Makerecord_llm_callaccept a count (or loop) and passtoken_counter.llm_calls.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/agents/query.py` around lines 211 - 218, The code records token usage once but always increments llm_calls_total by 1, causing undercount when extract_agent_token_usage sets TokenCounter.llm_calls > 1; update recording.record_llm_call to accept a count parameter (e.g., count:int) or support being called multiple times, then call it with token_counter.llm_calls instead of a single call; ensure record_llm_token_usage still records tokens once and record_llm_call uses the passed count to increment metrics.llm_calls_total by that amount (or loop internally), updating the record_llm_call signature and all call sites accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/utils/agents/query.py`:
- Around line 211-218: The code records token usage once but always increments
llm_calls_total by 1, causing undercount when extract_agent_token_usage sets
TokenCounter.llm_calls > 1; update recording.record_llm_call to accept a count
parameter (e.g., count:int) or support being called multiple times, then call it
with token_counter.llm_calls instead of a single call; ensure
record_llm_token_usage still records tokens once and record_llm_call uses the
passed count to increment metrics.llm_calls_total by that amount (or loop
internally), updating the record_llm_call signature and all call sites
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a988433f-7e32-4643-82bc-297dc1e3d1ce
📒 Files selected for processing (1)
src/utils/agents/query.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: radon
- GitHub Check: list_outdated_dependencies
- GitHub Check: integration_tests (3.12)
- GitHub Check: Pylinter
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/utils/agents/query.py
🔇 Additional comments (2)
src/utils/agents/query.py (2)
74-86: Still usingArgs:headings instead of the repo-requiredParameters:sections.This new module still hasn't been aligned with the required Google-style docstring headings. As per coding guidelines, "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes."
Also applies to: 106-114, 138-145, 158-166, 189-198, 230-244, 285-300
Source: Coding guidelines
256-259: Assistant text is still overwritten instead of accumulated.If
run_result.new_messages()yields multiple text-bearingModelResponses, earlier fragments are dropped and the summary only keeps the last one.
jrobertboos
left a comment
There was a problem hiding this comment.
LGTM but some of the e2e tests are failing, is that expected?
Retried job passed, failing test was unrelated anyway. |
Description
Adds utility functions for non-streaming query served by LLS pydantic-ai agent provider. For now, utilities are disconnected from query handler - that will be added in a separate PR.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Tests